Abstract

Hexanucleotide repeat expansion in C9ORF72, the gene encoding the Guanine
Nucleotide Exchange protein C9ORF72, is considered a main cause of
Amyotrophic Lateral Sclerosis (ALS). The repeat expansion produces toxic
products and autophagy deficits. Various in silico approaches were employed
for 3D structural modeling and protein-protein molecular docking analyses of
C9ORF72, followed by virtual screening. Homology modeling and threading
approaches were applied to predict the 3D structure of C9ORF72, and the ERRAT
evaluation tool calculated a quality factor of 92.38% for the selected model.
The STRING database indicated that SMCR8 is an interacting partner of
C9ORF72. Protein-protein molecular docking analyses of C9ORF72 with SMCR8
were performed and potential interacting residues were identified
computationally. The FDA library from the ZINC database was used for virtual
screening, and comparative molecular docking analyses were performed with
AutoDock Vina. The top-ranked compound, ZINC131, showed the strongest
predicted binding affinity, with the least binding energy of -11.3 kcal/mol.
The suggested molecule may be used for further analyses in the drug discovery
process. The predicted 3D structure of C9ORF72 provides structural insights
for a better functional understanding of the protein. Overall, the findings
of the present work may be helpful in designing novel therapeutics against
C9ORF72.


This work is licensed under the Creative Commons Attribution
Non-Commercial 4.0 International License.
Introduction


Amyotrophic Lateral Sclerosis (ALS) was first described by Charcot in
1874 [1]. The cause of ALS (Lou Gehrig's disease) is not completely known,
and the disease is considered to arise spontaneously [2]. In ALS, the upper
and lower motor neurons, which link the brain and the voluntary muscles,
degenerate and die [3]. ALS affects the nerve cells in the brain and spinal
cord, and the brain loses the ability to initiate and control muscle
movement [4]. The gradual decline in strength leads to paralysis of more and
more muscles and eventually to death. The genetic causes of ALS include
mutations in C9ORF72, FUS, SOD1, VCP and TDP-43 [5]. Most ALS cases are
sporadic, and about 25% of patients have a family history of the disease
[6]. Sporadic ALS usually affects patients between 55 and 65 years of age,
and only 5% of patients are younger than 30 years. Males and females are
equally affected by ALS. Juvenile ALS (JALS) is the term used for patients
below the age of 25 years [7].

ALS is reported to be autosomal recessive in most cases, while dominant
inheritance is linked with chromosome 9 [8]. ALS is a motor neuron disease
characterized by muscle twitching [9], muscle stiffness and muscle weakness
due to a decrease in muscle size. In most cases, patients lose the ability
to speak, walk, breathe, swallow and move their hands [10, 11]. Some
patients face difficulties in thinking and behavior [12]. ALS has no cure
yet; however, early diagnosis may help to treat the disease and preserve
muscle control a little longer [13]. Death usually occurs within 3 to 4
years due to respiratory failure [14]. ALS is related to parkinsonism and
dementia [15]. Although there is no specific cure, Riluzole provides
considerable relief to patients [16]. Another drug approved by the FDA for
ALS is Edaravone [17].

C9ORF72 is localized on chromosome 9 [18]. Expansion of the GGGGCC
hexanucleotide repeat in C9ORF72 is considered a common cause of ALS [19].
The hexanucleotide repeat is considered the genetic cause in almost 10% of
patients [20]. The guanine nucleotide exchange protein C9ORF72 is encoded by
the C9ORF72 gene [21], and its structure is not yet known. The gene is
present on the short arm (p) of the chromosome in humans [22] and spans
positions 27,546,542 to 27,573,863 base pairs. The protein is present in
most regions of the brain, in presynaptic terminals and in the cytoplasm of
neurons [23, 24]. Normally, C9ORF72 carries fewer than 30 repeats of the
hexanucleotide GGGGCC; in patients with ALS, however, the repeats number in
the hundreds [25, 26]. These repeats decrease the level of the autophagy
regulator protein C9ORF72, and the altered expression leads to ALS. The lack
of C9ORF72 might therefore be the cause of the disease. C9ORF72 emerged in
most eukaryotes, which have a single copy of the gene encoding it [27].
Repetition of this noncoding motif of four guanine and two cytosine
nucleotides causes severe mutational changes that lead to ALS [28, 29]. The
hexanucleotide repeat in C9ORF72 can cause RNA toxicity through the
sequestration and accumulation of RNA-binding proteins. The guanine
nucleotide exchange protein has two isoforms; one has a sequence length of
481 amino acids while the other has 222 amino acids. Repeat expansion thus
mutates C9ORF72 and leads to ALS. The neuronal function of C9ORF72 is
unknown. C9ORF72 has structural homology with Differentially Expressed in
Normal and Neoplastic cells (DENN) proteins [30].
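The repeat-length distinction described above (fewer than 30 GGGGCC units
normally, hundreds in ALS patients) can be illustrated with a minimal sketch.
It is not part of the reported workflow; the function names and the use of the
30-repeat figure as a hard cut-off are assumptions made only for illustration.

```python
import re

REPEAT_UNIT = "GGGGCC"  # hexanucleotide unit named in the text
NORMAL_LIMIT = 30       # "<30" repeats is described as normal in the text


def longest_repeat_run(dna: str, unit: str = REPEAT_UNIT) -> int:
    """Return the largest number of back-to-back copies of `unit` in `dna`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(dna.upper())]
    return max(runs, default=0)


def classify(dna: str) -> str:
    """Label a sequence using the <30 cut-off (illustrative assumption)."""
    if longest_repeat_run(dna) < NORMAL_LIMIT:
        return "normal range"
    return "expanded"
```

Only uninterrupted tandem copies are counted here; interrupted repeats or
sequencing errors would need a more tolerant approach.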

During the last two decades, the number of known protein sequences has
increased far faster than the number of known structures [31]. This
imbalance between protein sequences and structures has seriously limited the
ability to understand the molecular mechanisms of proteins [32]. Structures
are solved much more slowly because the experimental structure determination
techniques (X-ray crystallography and NMR) are time consuming [33]. The
development of structural bioinformatics helps in the structural analysis of
biological macromolecules (DNA, RNA and proteins) [34]. There have been many
achievements in computational drug design and personalized medicine [35,
36]. Various computational approaches are available for understanding
neurological diseases, and they play an important role in the medical field
[37-39]. The focus of the current work was the 3D structure prediction,
evaluation and validation of the Guanine Nucleotide Exchange protein
C9ORF72, followed by protein-protein molecular docking and virtual
screening.


Materials and Methods 


C9ORF72 has the accession number Q96LT7 in the UniProt Knowledge Base. In
this work, 3D structure prediction, virtual screening and molecular docking
analyses were performed.

The amino acid sequence of the Guanine nucleotide exchange protein C9ORF72
was retrieved in FASTA format from UniProtKB (http://www.uniprot.org/) [40].
The sequence was subjected to BLASTp against the Protein Data Bank (PDB) for
the selection of a suitable template [41, 42]. The automated program
MODELLER 9.20 [43] was used for homology modeling to predict the 3D
structure of C9ORF72 by satisfaction of spatial restraints. The online tools
I-TASSER [44], RaptorX [45], CPHModel [46], HHpred [47], Phyre2 [48],
SWISS-MODEL [49], MOD-WEB [50], Robetta [51], Sparks-X [52], 3D-Jigsaw [53]
and ESyPred 3D [54] were also used to predict the protein structure. The 3D
structures of C9ORF72 were visualized in UCSF Chimera 1.13.1 [55] and PyMOL
[56]. UCSF Chimera was also used to energy-minimize the selected structure.
The Rampage [57], Anolea [58], ProCheck [59] and ERRAT [60] evaluation tools
were used to assess the quality of the predicted models. The Ramachandran
plots generated for the predicted structures showed the distribution of
residues, including the φ and ψ distribution of non-proline and non-glycine
residues. To differentiate favorable from non-favorable regions, the phi and
psi angles were plotted against each other, and these angles were used to
assess the different regions. The ERRAT evaluation tool was also used to
calculate the quality factors of the predicted structures [61].
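The phi and psi angles plotted in a Ramachandran plot are backbone dihedral
angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by
N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of how such an
angle can be computed from four atomic coordinates. It is an illustration
only, not the implementation used by Rampage or ProCheck, and the function
names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (x, y, z)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(b1, _cross(n1, n2)) / b1_len
    return math.degrees(math.atan2(y, x))
```

Calling `dihedral` with the coordinates of C(i-1), N(i), CA(i), C(i) gives
phi for residue i; the psi angle is obtained in the same way from its own
four atoms.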

To determine the functional interacting partners of the target protein,
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) [62]
and STITCH (Search Tool for Interactions of Chemicals) [63] were used. The
online server PatchDock [64, 65] was used for protein-protein molecular
docking, and FireDock [66] was used for the refinement and scoring of the
protein-protein docking solutions. Gramm-X was also employed for
protein-protein docking to cross-validate the analyses. LigPlot [67, 68] was
used to analyze the hydrophobic and electrostatic interactions and to
generate schematic diagrams of the protein-protein interactions.

PyRx [69] was used for virtual screening and for docking small molecules to
the macromolecule. Blind docking was performed to analyze the conformation
and orientation of the protein-ligand interactions. The FDA library of the
ZINC database [70] was retrieved for virtual screening against the target
protein [71].
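In blind docking the search box covers the whole receptor rather than a
known pocket. The sketch below shows one way such a box could be derived
from the atom coordinates of a PDB file. The padding value and the function
names are assumptions for illustration; the box actually used in this work
is not reported. The `center_*` and `size_*` keys follow the AutoDock Vina
configuration file format.

```python
def blind_docking_box(pdb_text: str, padding: float = 5.0):
    """Return (center, size) of a box enclosing all atoms plus padding (in Å)."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-column PDB format: x, y, z occupy columns 31-54.
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_box_lines(center, size):
    """Format the box as lines for an AutoDock Vina configuration file."""
    lines = [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value:.3f}" for axis, value in zip("xyz", size)]
    return lines
```

A box enclosing a 481-residue protein is large, so a blind search of this
kind generally needs more sampling than a focused one.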


Results and Discussion 


Neurology and structural bioinformatics together provide an effective way to
understand neuronal diseases, including ALS, and to develop research
approaches for their detection, diagnosis and cure. The Ensembl genome
browser was used to locate the exact position of the C9ORF72 protein-coding
gene in humans (Fig. 1).
An extensive literature review showed that the Guanine Nucleotide Exchange
protein has no other reported family member. Multiple sequence alignment
(MSA) of the two isoforms of C9ORF72 was carried out with Clustal Omega
[72]. An asterisk (*) indicates positions with a single, fully conserved
residue, a colon (:) indicates similar residues and a dot (.) indicates
weakly similar residues. Positions with no symbol indicate non-conserved
residues (Fig. 2).

Fig. 1: The position of the C9orf72 gene on chromosome 9.

The Coils and ProtParam [73] tools were used to calculate the
physicochemical properties of the receptor protein. The molecular weight of
the protein, based on the average isotopic masses of the amino acids, was
also studied. The theoretical pI, which depends on the ionizable side
chains, was 5.82. The half-life of the protein was calculated as 30 hours in
vitro. The aliphatic index was -0.069 and the instability index was 50.54.
The protein has 50 positively charged residues (Arg + Lys) and 60 negatively
charged residues (Asp + Glu), and the total number of atoms was 7683
(Table 1) (Fig. 3).
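Two of the quantities listed above can be computed directly from the amino
acid sequence. The sketch below counts charged residues and evaluates the
aliphatic index using Ikai's formula (mole percent of Ala, plus 2.9 times
Val, plus 3.9 times Ile and Leu). It is an illustration with hypothetical
function names, not the ProtParam implementation.

```python
def charged_residue_counts(sequence: str):
    """Return (positive, negative) counts: Arg + Lys and Asp + Glu."""
    seq = sequence.upper()
    positive = seq.count("R") + seq.count("K")
    negative = seq.count("D") + seq.count("E")
    return positive, negative


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index after Ikai: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu))."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

By construction this index cannot be negative, whereas the grand average of
hydropathicity (GRAVY), which ProtParam also reports, can be.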

The PONDR [74] tool was used to predict the naturally disordered regions of
C9ORF72. The resulting graph shows the composition of order and disorder:
the center line is the threshold, peaks above the threshold are identified
as disordered, and regions below the threshold are identified as ordered
(Fig. 4).
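The threshold rule described for the disorder plot can be expressed as a
short sketch that turns per-residue scores into disordered segments. The
threshold of 0.5 and the function name are assumptions for illustration; the
per-residue scores themselves would come from the predictor output.

```python
def disordered_segments(scores, threshold=0.5, min_length=1):
    """Return 1-based (start, end) residue ranges whose score exceeds threshold."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position  # a new disordered stretch begins here
        elif start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) + 1 - start >= min_length:
        segments.append((start, len(scores)))
    return segments
```

Raising `min_length` filters out isolated residues that cross the threshold
without forming a longer disordered stretch.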


[CLUSTAL O(1.2.4) multiple sequence alignment of the C9ORF72 isoforms sp|Q96LT7 and sp|Q96LT7-2]

Fig. 2: Alignment retrieved from Clustal Omega of the related mouse and bovine proteins with the human protein,
showing residues marked with * (identical) and : (somewhat similar).


Fig. 3: Pie chart representation of amino acid composition of C9ORF72 and calculated percentage values.

Fig. 4: Disorder residues of Guanine Nucleotide exchange C9ORF72.


Structure Prediction
The 3D structure of C9ORF72 has not yet been determined by X-ray
crystallography or NMR. Comparative modeling and threading approaches were
therefore employed to predict the 3D structures. The protein sequence was
submitted to BLASTp against the PDB to retrieve suitable templates. Only one
template was found for the query sequence (Table 2).


Table 1: Calculated features of C9ORF72.

Features                                                 Calculations
Aliphatic index                                          -0.069
Instability index                                        50.54
Total number of positive charge residues (Arg + Lys)     50
Total number of negative charge residues (Asp + Glu)     60
Total number of atoms                                    7683


Table 2: BlastP against PDB.

Description                          Max score    Total score
Cap-Associated protein CAF20 21A     27.3


All 50 predicted structures were evaluated on the basis of quality factor
and of the favored, allowed and outlier regions of the Ramachandran plot. A
comparative graph was plotted to identify the most suitable structure among
the predicted structures, and the structure was selected from this graph
(Fig. 5).

The quality factor varied among the predicted structures, and the selected
structure of the Guanine nucleotide exchange protein C9ORF72 had an overall
quality factor of 92.38%. The Ramachandran plot was used to evaluate the
quality of the predicted structures; the selected structure had 99.60% of
residues in the favored region and 0.40% in the allowed region.
Interestingly, no amino acid was observed in the outlier region. Energy
minimization of the selected structure was performed to improve its
stereochemical properties, using UCSF Chimera 1.13 with 1000 steepest
descent and 1000 conjugate gradient steps (Fig. 6).


Protein-protein molecular docking analyses


The Guanine Nucleotide Exchange protein is expressed in many parts of the
body, and specifically in the brain. SMCR8 was identified as the interacting
partner of C9ORF72 using the STRING and STITCH databases and was selected
for protein-protein molecular docking analyses. Comparative molecular
docking analyses were performed to evaluate the binding residues. The docked
complexes of SMCR8 and C9ORF72 were generated with PatchDock and analyzed on
the basis of atomic contact energy (ACE) [75] (Table 2). Numerous docked
complexes were generated, and the top 10 complexes with the least ACE values
were selected for further refinement through FireDock. The refined complexes
were evaluated on the basis of the least global binding energy. The
molecular docking analyses suggested that SMCR8 and C9ORF72 bind with
effective affinity [76]. The interacting residues were analyzed in UCSF
Chimera (Table 3) (Fig. 7).
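The two-stage selection described above (shortlisting by least ACE, then
ranking the refined complexes by least global energy) can be sketched as
follows. The function names are hypothetical, and each complex is
represented as a simple (name, score) pair; real PatchDock and FireDock
outputs would first have to be parsed into this form.

```python
def shortlist_by_ace(complexes, keep=10):
    """Return the `keep` (name, ace) pairs with the lowest ACE values."""
    return sorted(complexes, key=lambda item: item[1])[:keep]


def best_by_global_energy(refined):
    """Return the (name, global_energy) pair with the lowest global energy."""
    if not refined:
        raise ValueError("no refined complexes supplied")
    return min(refined, key=lambda item: item[1])
```

In both stages a lower (more negative) score is treated as more favorable,
which matches the "least ACE" and "least global energy" criteria in the
text.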


Table 2 (continued).

Identity    Accession    Query coverage    E-value
5.83%       6FC3         4%                7.84
Fig. 5: Graph of quality factor, favored region, allowed region and outlier region of the C9ORF72 structure prediction
analyses evaluated through different software.





Fig. 6: 3D structure of ALS associated protein Guanine Nucleotide Exchange C9ORF72. 
Table 3: Protein-protein interacting residues.

Interacting protein: SMCR8
Interacting protein residues: LYS 15, GLN 21, TYR 24, GLN 17, LEU 18, ASN 20,
ALA 19, ARG 43, ASN 23, TYR 39, LYS 42, GLU 36, GLY 35, SER 34, ARG 33,
LYS 5, TYR 54

Targeting protein: Guanine Nucleotide Exchange C9ORF72
Targeting protein residues: PRO 467, THR 470, TYR 469, GLY 465, PHE 462,
PHE 464, SER 361, VAL 384, PRO 389, LEU 402, SER 395, ALA 399, LEU 403,
ARG 407, GLN 400, SER 392, ARG 329, TYR 326, LEU 391


Molecular docking analysis


The FDA library of the ZINC database was screened using AutoDock Vina, and
the 4 top-ranked compounds were selected. Comparative molecular docking
analyses were performed with AutoDock Vina (Fig. 7).





Fig. 7: Protein-Protein docking analysis and the interacting residues. 


The generated docked complexes were ranked on the basis of least binding
energy. The screened compounds bound in a similar binding domain, and the
binding energies varied among the analyzed complexes. The selected compound
may have potential against C9ORF72. The compound ZINC131 showed the least
binding energy of -11.3 kcal/mol, and the 2D structure of this compound was
drawn in ChemDraw Ultra 8.0. The ligand-protein interactions were analyzed
in UCSF Chimera (Fig. 8).
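A docking binding energy can be related to an approximate dissociation
constant through the standard thermodynamic relation ΔG = RT ln(Kd). The
sketch below applies that relation; the temperature of 298.15 K and the
function name are assumptions, and docking scores are only rough estimates
of the true binding free energy, so the result indicates an order of
magnitude at best.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # gas constant R in kcal/(mol*K)


def estimated_kd(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Estimate the dissociation constant (mol/L) from a binding free energy."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))
```

Under these assumptions, a score of -11.3 kcal/mol corresponds to an
estimated dissociation constant in the low nanomolar range.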
Fig. 8: The interactions of the top-ranked compound with C9ORF72. The residues were analyzed with AutoDock Vina and
UCSF Chimera.


Fig. 9: 2D structure of the compound with the least binding energy from the molecular docking.


The function of a protein depends on its structure. Structural
bioinformatics opens the way towards further progress in the analysis of
protein function. Computational structure prediction is less time consuming
than experimental determination [77]. Computational biology, which is
necessary for the prediction of function, contributes substantially to
research.


Conclusion 


By employing computational approaches and in silico analyses with AutoDock
Vina, the analyzed molecules were found to bind residues in a conserved
region. The in silico molecular docking analyses proposed that these binding
residues are significant for controlling the expression of C9ORF72. The
observed results suggested that the selected molecule could be used as a
starting point for novel chemical compounds.